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ABSTRACT 


n 

The  nonlinear  problem  is  an  unconstrained  optimization 

problem  whose  objective  function  is  not  differentiable  everywhere,  and 
hence  cannot  be  solved  efficiently  using  standard  techniques  for 
unconstrained  optimization.  The  problem  can  be  transformed  into  a 
nonlinearly  constrained  optimization  problem,  but  it  involves  many 
extra  variables.  We  show  how  to  construct  a  method  based  on  projected 
Lagrangian  methods  for  constrained  optimization  which  requires  successively 
solving  quadratic  programs  in  the  same  number  of  variables  as  that  of 
the  original  problem.  Special  Lagrange  multiplier  estimates  are  used 
to  form  an  approximation  to  the  Hessian  of  the  Lagrangian  function, 
which  appears  in  the  quadratic  program.  A  special  line  search  algorithm 
is  used  to  obtain  a  reduction  in  the  objective  function  at  each 

iteration.  Under  mild  conditions  the  method  is  locally  quadratically 
convergent  if  analytical  Hessians  are  used. 

\ 


A  PROJECTED  LAGRANGIAN  ALGORITHM  FOR  NONLINEAR  ^  OPTIMIZATION 
Walter  Murray  and  Michael  L.  Overton 


1.  Introduction 

The  problem  we  wish  to  solve  is 

Jl^P  :  min{F^(x)|x  €  Rn} 
x 

m 

where  F^x)  ■  )  I  f  ±  (x)  | 

i“l 

and  the  functions  f^cR11  ■+■  R^  are  twice  continuously  differentiable. 

The  function  F^(x)  is  called  the  function  and  i.^P  is  referred 

to  as  the  problem.  The  problem  is  an  unconstrained  optimi¬ 

zation  problem  in  which  the  objective  function  has  discontinuous 
derivatives  and  hence  it  is  Inappropriate  to  use  a  standard  uncon¬ 
strained  minimization  method  to  solve  it.  The  problem  is  equivalent 
to  the  following  nonlinearly  constrained  problem  in  which  both  the 
objective  and  constraint  functions  are  twice  continuously  differ¬ 
entiable: 

m 

ELP  :  min{  £  u  |x  €  Rn,  u  €  Rm} 

i-1  1 

subject  to  c^a^  (x,u)  j>0,  i  ■  1,2,...,  m;  a  *  -1,1, 
where  c|CT^  (x,u)  ■  u^  -  af^(x)  . 
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We  could  solve  ELP  using  a  method  for  the  general  nonlinear  programming 
problem,  but  this  is  very  unattractive  since  m,  the  number  of  extra 
variables,  may  L?  large.  A  method  can  be  derived  which  exploits  the 
special  structure  of  problem  ELP  essentially  reducing  it  back  to  a  problem 
with  n  variables.  One  special  feature  of  ELP  is  that  the  2.^  function 
is  a  natural  merit  function  which  can  be  used  to  measure  progress 
towards  the  solution  of  ELP.  Such  a  merit  function  is  not  generally 
available  for  the  nonlinear  programming  problem  without  the  introduction 
of  a  parameter  such  as  penalty  parameter. 

The  method  we  adopt  to  solve  i^P  consists  of  two  parts  at 
each  iteration:  (1)  obtain  a  direction  of  search  by  solving  and 
perhaps  modifying  a  quadratic  program  based  on  a  projected  Lagrangian 
algorithm  for  ELP,  and  (2)  take  a  step  along  the  search  direction 
which  reduces  the  function.  The  general  approach  is  similar  to 

that  described  for  the  minimax  problem  by  Murray  and  Overton  [15]. 

The  structure  of  the  quadratic  program  to  be  solved  is  however  con¬ 
siderably  different  from  the  minimax  case  and  this  is  described  in  full 
in  subsequent  sections.  We  use  a  special  line  search  algorithm  which 
is  closely  related  to  the  one  used  in  the  minimax  case.  This  is 
discussed  in  Section  7;  the  details  may  be  found  in  [14]. 

A  number  of  other  algorithms  have  been  proposed  for  solving  the 
nonlinear  problem.  These  will  be  discussed  further  in  Section  9, 

after  our  algorithm  has  been  described  in  full.  At  the  time  of  this 
writing, no  other  algorithms  related  to  projected  Lagrangian  methods 
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using  second  order  information  have,  to  our  knowledge,  been  published. 
However,  Bartels  and  Conn  are  currently  doing  some  related  work. 

We  note  that  no  convexity  assumptions  are  made  about  the  functions 
f^(x).  We  concern  ourselves  only  with  local  minima. 


1.1.  Notation 

All  vectors  are  column  vectors,  but  for  convenience  we  will 
/x\  *  * 

write  (x,u)  for  I  I.  Define  (x,u)  to  be  a  solution  of  ELP.  It 
* 

follows  that  x  is  a  solution  to  i^P  and 


m 


-  I 

i-i 


fk)  (k)  *  * 

Let  (xv  ,uv  ')  denote  the  k-th  approximation  to  (x,u) . 

(k+1)  fk+1} 

At  each  iteration  of  the  algorithm  (x  ,u  )  is  obtained 
by  setting 


(k+1)  _  (k) 


+  up 


and 


.(k+1) 


If  (x 


(k+1) 


)l 


where  p  is  the  direction  of  search  in  Rn  and  u,  a  positive  scalar, 
is  the  steplength,  and  the  absolute  value  of  a  vector  denotes  the 
vector  of  the  absolute  values  of  the  components.  Note  that  this  choice 
of  (x^k+*\  u^+1^)  immediately  guarantees  that  all  the  points  { (x^,u 

are  feasible  for  ELP,  i.e.  c^[o)(x(k))  >  0,  i  -  1, . . .  ,m,  o  -  -1,+1. 

It  also  follows  that  for  each  i,  at  least  one  of  the  pair  of  constraints 
(c^  c<+1))  must  have  the  value  zero.  We  will  be  interested  in 

the  case  where  the  other  constraint  in  the  pair  is  also  zero  at  the 


(k) 


solution,  i.e.  the  corresponding  function  is  zero.  Therefore  at 

any  point  x  we  define  the  active  set  of  functions  as  those  which  we 

* 

think  will  have  the  value  zero  at  the  solution  x,  based  on  the  infor¬ 
mation  at  x.  This  set  will  usually  include  all  functions  with  the 
value  zero  at  the  point  x  and  may  also  include  some  with  nonzero 
values.  The  exact  procedure  for  selecting  the  active  set  at  each  iteration 
will  be  discussed  in  Section  8  and  procedures  for  modifying  this  choice 
will  be  discussed  in  Section  5. 

We  define  t  (=  t(x))  to  be  the  number  of  active  functions  at 

x  and  write  the  vector  of  active  functions  as  f(x)  6  R*" ,  Define 

*  th 

E(x)  to  be  the  diagonal  square  matrix  of  order  t  whose  i —  diagonal 

component  is  1  if  f^(x)  0  and  -1  otherwise.  Define  V(x)  to  be 

the  n  x  t  matrix  whose  columns  {v^(x)}  are  the  gradients  of  the 
active  functions.  Similarly  we  define  f(x)  €  Rm  t  to  be  the  vector 
of  inactive  functions  at  x  and  define  E (x)  and  V(x)  to  be  respec¬ 
tively  the  (m-t)  x  (m-t)  diagonal  matrix  of  the  signs  corresponding  to 
f(x)  and  the  n  x  (m-t)  matrix  of  gradients  of  the  inactive  functions. 

We  also  define  u  and  u  to  be  the  subvectors  of  u  corresponding  to 
f  and  f . 

We  define  the  active  constraints  at  x  to  be  both  constraints 
of  each  pair  corresponding  to  the  active  functions  plus  the  one  constraint 
with  zero  value  of  each  pair  corresponding  to  the  inactive  functions. 

We  can  order  the  active  constraints  so  that  the  vector  of  active  constraint 
values  is  given  by 
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u  -  I(x)  f(x) 


c(x)  *  u  -  f(x) 
u  +  f(x) 


with  u  •  £(x)  f(x),  u  *  E(x)  f(x),  by  definition  of  u.  Define  A(x) 
to  be  the  (nrt-n)  x  (m+t)  matrix  whose  columns  {a^}  are  active 
constraint  gradients.  We  can  order  the  variables  (u^)  so  that 


A(x)  = 


-V (x)  E(x)  -V(x)  V (x) 


Here  lg  is  the  identity  matrix  of  order  s. 

We  define  Y(x)  and  Z(x)  to  be  orthogonal  matrices  respectively 

A  A. 

spanning  the  range  and  null  spaces  of  V(x).  Provided  V(x)  has  full 
rank  we  have  that  Y(x)  has  dimension  n  x  t,  Z(x)  has  dimension  n  x  (m-t). 


Y (x)  Y(x)  -  It, 


Z(x)  Z(x)  «=  In_t  , 


Y (x)T  Z(x)  -  V(x)T  Z (x)  =  0  . 


Let  g  be  the  gradient  of  the  objective  function  of  ELP,  i.e. 


the  (n-Hn)-vector: 


8  ■  « 


^  ^  ^  t 

where  e  €R  and  e  €R  are  vectors  of  all  ones. 


The  Lagrangian  function  associated  with  ELP  is 


L(x,u,X)  =  eTu  +  e^u  -  X^c(x) 

where  X  6  Rm+t  is  a  vector  of  Lagrange  multipliers.  The  gradient  of 
L(x,u,X)  with  respect  to  x  is  g  -  AX.  Define  W  to  be  the  Hessian 
of  the  Lagrangian  function  with  respect  to  (x,u).  Then 


W 


E 


where  W  is  the  Hessian  of  L(x,u,X)  with  respect  to  x  only,  i.e. 


m+t 


W(x,X)  *  l  X47$  GO 
i=l 


Define  Z  to  be  an  orthogonal  matrix  spanning  the  null  space  of  A, 

i.e.,  A  Z  =  0.  It  follows  from  the  definition  of  A  that  the  first 
b 

n  rows  of  Z  can  be  taken  to  be  Z,  the  matrix  which  is  orthogonal 
b 

to  V.  Thus  Z„W  Z„,  the  Hessian  of  L(x,u,X)  projected  into  the 
b  b  b 


null  space  of  A,  can  also  be  written  as  Z  WZ. 


We  will  use  p„  to  denote  a  vector  in  R 
b 


n+m 


whose  first  n 

components  are  the  direction  of  search  vector  p.  We  write 

/  P 
P 

\  P 


where  p  and  p  correspond  to  u  and  u. 
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Often  we  will  omit  the  arguments  from  the  various  vectors  and 

matrices  f,  V,  A  etc.  when  it  is  clear  that  they  are  evaluated  at 

(k)  *  *  *  - 

x  .  We  use  the  notation  A,  V,  Z  to  denote  A,  V,  Z  evaluated  at 

A  A 

(x,u)  with  the  active  set  of  functions  correctly  chosen,  i.e.,  con¬ 
sisting  of  all  those  functions  with  the  value  zero  at  x. 


1.2.  Necessary  and  Sufficient  Conditions 

k  k 

The  necessary  and  sufficient  conditions  for  (x,u)  to  be  a 

k 

local  minimum  of  problem  ELP  and  therefore  x  to  be  a  local  minimum 
of  problem  are  simplifications  of  the  general  necessary  and 

sufficient  conditions  for  the  nonlinear  programming  problem.  A 
similar  argument  to  that  given  in  [15]  shows  that  the  first-order  con¬ 
straint  qualification  always  holds  for  ELP.  The  conditions  therefore 
reduce  to  the  following: 

First-order  necessary  condition 
*  *\ 

If  (x,u)  is  a  local  minimum  of  ELP  then  there  exists  a  vector 
^  ffi+t 

of  Lagrange  multipliers  X  €  r  such  that 

g-AX-0  and  X  >  0  .  (1.2) 


Second-order  necessary  condition 

A  A 

If  (x,u)  is  a  local  minimum  of  ELP  and  the  second-order  con- 

AT  AAA 

straint  qualification  holds,  then  Z  W(x, \)Z,  the  projected  Hessian  of 
the  Lagrangian  function,  is  positive  semi-definite. 
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Sufficient  condition 


If  the  first-order  necessary  condition  holds  at  (x,u),  the 

* 

Lagrange  multipliers  are  all  strictly  positive,  i.e.  X  >  0,  and 
*t  *■  *  *  *  * 

Z  W(x,X)Z  is  positive  definite,  then  (x,u)  is  a  strong  local  minimum 

* 

of  ELP.  Thus  in  terms  of  Jl^P,  F^(x)  <  F^(x)  for  all  x  such  that 

i  *  i 

|x-x|  <  6  ,  for  some  6  >  0. 

In  the  case  where  all  the  {f  }  are  linear  it  is  well-known  that 

a  solution  must  exist  with  n  active  functions  at  x.  Then  normally 
* 

the  matrix  Z  is  null  which  implies  the  second-order  conditions  are 

also  null.  The  nonlinear  problem,  however,  can  have  a  unique  solution 

* 

with  anything  from  zero  to  n  functions  active  at  x. 


2 •  Use  of  the  Equivalent  Problem  ELP 

At  every  iteration  we  wish  the  search  direction  p  to  be  a 
descent  direction  for  F^,  i.e. 

F^(x(k),p)  <  0  , 

(k) 

where  Fj(x  ,p)  is  the  directional  derivative 

lim  £  (F  (x(k)  +  hp)  -  F  (x(k)))  . 
h  +  0  1 

(k) 

It  is  easy  to  see  that  F|(x  ,p)  is  also  given  by 
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,  I 

ilf^O 


TfTT  V 


i|fi=0 


ivTpi 


(2.1) 


We  have  the  following: 


THEOREM  1.  If  p„  is  a  first-order  feasible  descent  direction  for  ELP, 

-  L 

i.e. 

gTPE  <  o  (2.2) 

and 

VCj°^  p„  >  0  for  all  i,  a  such  that  cf°^  =  0  ,  (2.3) 

then  p  is  a  descent  direction  for  and  hence  a  sufficiently  small 

step  along  it  must  result  in  a  reduction  in  . 

Proof.  Suppose  for  the  moment  that  the  active  set  consists  of  those 

00 

and  only  those  functions  which  are  zero  at  x  ,  so  that  we  can  use 

AT 

the  notation  developed  for  this.  We  then  have  A  p  >  0,  and  hence 

£j  — 

EV^p  +  p  ^  0 
-V^p  +  p  >_  0 

a  m  a 

V  p  +  p  >  0  . 

It  follows  from  (2.1)  and  (2.2)  that 

He)  -T-  "T" 

F|(xVK',p)  <  e  p  +  e‘p  <  0  .  □ 
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It  is  possible  for  p£  to  be  a  first-order  feasible  direction 
without  being  a  feasible  direction  for  ELP.  This  causes  no  difficulty 
since  u^+l)  is  set  to  |f(x^  ^)|  and  hence  it  is  always  possible 
to  obtain  a  lower  feasible  point  for  ELP  if  (2.2)  and  (2.3)  hold,  by 
reducing  along  p. 

A  second  desirable  property  for  p  arises  from  considering 

the  active  set  of  functions,  i.e.  those  we  expect  to  have  the  value 

* 

zero  at  x.  We  wish  to  choose  p  so  that  the  first-order  change  in 

these  functions  predicts  that  they  will  all  have  the  value  zero  at 
(k) 

x  +  p.  An  equivalent  condition  is: 

VTp  =  -f  .  (2.4) 


This  condition  is  implied  by  the  following  condition  on  the  (n+m)- 
vector  pE: 


(2.5) 


Strictly  speaking,  (2.5)  is  a  stronger  condition  than  the  pair  of 
conditions  (2.4)  and 

ATpE  >  -c  .  (2.6) 

However,  since  the  only  difference  is  that  the  variables  (u^)  are 
required  to  be  on  their  bounds  and  it  will  become  evident  later  that 
this  does  not  affect  the  choice  of  search  direction,  for  simplicity 
we  will  require  that  p  satisfy  (2.5). 

Thus  we  see  that  one  view  of  ELP  is  as  a  device  to  obtain  a 
search  direction  p  along  which  can  be  reduced  in  the  line  search. 
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We  emphasize  again  that  we  wish  (2.2)  and  (2.3)  to  hold  so  that  p 

is  a  descent  direction  for  F, ,  and  that  the  active  set  nature  of  the 

•«» .  »•  1 

algorithm  indicates  that  (2.4)  and  hence  (2.5)  should  also  hold. 


3.  Derivation  and  Solution  of  QP  Subproblem 

The  solution  of  ELP  is  at  a  minimum  of  the  Lagrangian  function 
in  the  null  space  of  the  active  constraint  Jacobian.  The  usual  method 
for  solving  a  general  linearly  constrained  problem  is  to  approximate 
the  objective  function  by  a  quadratic  function  and  then  determine 
the  search  direction  by  solving  some  appropriate  quadratic  program  (QP) . 
Consider  therefore  the  quadratic  program: 


QP1:  ~  p£wE(xW  ,X(k))pE  +  gTpE 

PE 

-  x 

subject  to  A  p  *  -c  , 

E 

oo  * 

where  X  is  an  estimate  of  X 

The  constraints  of  QP1  are  equivalent  to  (rearranging  equations) : 


-  SVTp  +  p  =  0 

A  H  m  A 

-  IV  p  +  p  -  0 

A  A  m  A  A  A 

EV  p  +  p  =  -2Ef 


(3.1) 


The  last  two  equations  imply  that 


(3.2) 


p  -  ivTp  =  -It 

T  _X-  “T' 

Since  gp£  =  ep  +  ep  it  follows  that  p  can  be  obtained  by  solving 
the  following  QP  in  only  n  variables: 

QP2:  min  pTW(x(k) ,X(k))p  +  iTfvTp 

P 

subject  to  VTp  =  -f  . 

In  order  to  solve  QP2  we  introduce  the  matrices  Y  and  Z 
defined  in  Section  1.1.  These  provide  bases  for  the  range  and  null 
spaces  of  V.  The  matrices  Y  and  Z  may  be  determined  from  the 
QR  factorization  of  V: 


where  R  is  an  upper  triangular  matrix  of  order  t.  If  V  has  full 
T 

rank  and  Z  WZ  is  positive  definite,  then  the  unique  solution  of  QP2 
may  be  expressed  as  the  sum  of  two  orthogonal  components 


p  -  Ypy  +  Zpz 

where  py  €  RC  and  pz  €  Rn-t.  We  have 


(3.3) 
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(3.4) 


*T  T 

V  P  “  R  py  *  -f 

and  pY  is  determined  entirely  by  the  constraints  of  QP1.  The  vector 
p  is  given  by  the  solution  of 

Ld 

(ZTWZ)pz  =  -ZT(VZi  +  WYPy)  (3.5) 

We  shall  aaso  wish  to  refer  to  the  related  QP  with  homogeneous 
constraints: 

QP3:  min  pTWp  +  eTZVTp 

P 

*T 

subject  to  V  p  =  0  . 

The  solution  to  this  is  given  by  p  =  Zq  ,  where 

z 

(ZTWZ)qz  -  -ZTVZe  .  (3.6) 

At  every  iteration  of  our  algorithm  an  attempt  is  made  to  set 
the  search  direction  p  to  the  solution  of  QP2,  but  for  various 
reasons  this  may  be  inadequate.  In  subsequent  sections  we  explain  what 
action  is  taken  in  these  circumstances. 


4.  Lagrange  Multiplier  Estimates 

(k) 

Let  us  first  suppose  that  x  is  a  minimum  on  the  manifold 
defined  by  the  current  active  set.  Then  for  some  X  we  have 

AX  -  g. 
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Writing  X  *  (A, A  ,X  )  this  is  equivalent  to 


-VZA  -  VA  +  VA  =0 


A  =  e 


Let  us  therefore  define 


A  +  A  *  e  . 


7T  =  A  -  A 


Equations  (4.1)  and  (4.2)  then  reduce  to 


Vtt  =  -VEe 


(4.1) 


(4.2) 


(4.3) 


(4.4) 


(4.5) 


It  follows  from  (4.3)  and  (4.4)  that  the  first  order  necessary  condition 


A  0  is  equivalent  to 


i  *  i 

1*1  <  1 


*  * 

where  v  is  tt  at  x.  The  vector  7r  here  plays  the  same  role  as  the 

vector  w  in  the  linear  case  described  by  Bartels,  Conn  and  Sinclair  [5]. 

The  question  we  face  in  this  section  is  how  to  define  the  vector 

(k) 

of  Lagrange  multiplier  estimates  at  any  point  x  ,  dropping  the 

assumption  that  it  is  the  minimum  on  the  manifold.  Multiplier  estimates 

are  needed  to  define  the  matrix  W  and  to  determine  whether  constraints 

should  be  deleted  from  the  active  set.  It  is  better  to  use  new  informa- 

(k) 

tion  obtained  at  the  current  point  x  rather  than  use  the  multipliers 
of  the  QP  solved  at  the  previous  iteration.  Such  an  estimate  should  in 
some  sense  approximately  satisfy  the  overdetermined  system  based  on  the 


first  order  necessary  conditions: 


AA  ~  g  . 


(4.6) 


-VWM  v 


Clearly  it  makes  no  sense  to  delete  an  active  constraint  corresponding 
to  an  inactive  function,  since  the  corresponding  variable  will  be 

reduced  at  the  end  of  the  iteration  to  make  the  constraint  active 
again.  Similarly,  only  one  active  constraint  of  the  pair  corresponding 
to  an  active  function  should  be  considered  for  deletion.  These  facts 
combined  with  the  fact  that  the  search  direction  is  being  determined 
for  a  QP  involving  only  n  variables  indicate  that  a  special  estimate 
taking  into  account  the  structure  of  ELP  should  be  used,  as  opposed  to 
the  least  squares  solution  of  (4.6). 

X^,:  The  special  estimate  for  ELP 

The  special  estimate  is  required  to  satisfy  exactly 

those  equations  in  (4.6)  which  are  exactly  the  same  as 

the  equations  holding  at  the  minimum  on  the  manifold  defined  by  the 

active  set.  Thus  we  define  Xc  to  be  the  least  squares  solution  to  the 
approximate  equation  (4.1),  subject  to  the  constraint  that  (4.2)  and 
(4.3)  hold  exactly.  Equivalently  we  can  define  ^  as  the  solution  of 
the  least  squares  problem 

minllVir  +  Vlell^  .  (4.7) 

It  is  then  not  necessary  to  explicitly  form  Xc>  since  checking  whether 

a  component  of  it  is  greater  than  one  in  modulus  is  equivalent  to 

*+  A- 

checking  whether  a  component  of  X  or  X  is  negative.  Furthermore, 
using  Xc  to  define  W  results  in 
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m-t 


u  -  T  <Jc>i"'i’2fi +  j.  +  i  -<;c>i'2'i  • V  ;A *  j/v  A 

1=1  1=1  1=1  1=1  1=1 


.  th 


be 


where  o.  is  the  i  diagonal  element  of  l .  Thus  tt  may  also 

1  V 

used  to  define  W  directly. 

The  vector  can  be  computed  directly  from  the  QR  factorization 

of  V  which  we  introduced  in  the  last  section  to  solve  QP2.  The  estimate 
is  a  first-order  multiplier  estimate  in  the  sense  defined  in  [15]. 

Because  at  every  iteration  computing  the  search  direction  involves 
only  the  first  n  variables  of  ELP,  the  multiplier  estimate  which 
is  relevant  to  predicting  whether  the  steepest  descent  step  in  a  sub¬ 
space  of  Rn  will  be  first-order  feasible  is  X^,,  not  the  least  squares 

solution  to  (4.6).  Let  us  suppose  that  (it  )  .  >  1  and  that  the  con- 

L  J 

straint  u^  >  -f  (x)  is  to  be  deleted  from  the  active  constraint  set 
(i.e.  fj  is  to  be  deleted  from  the  active  function  set).  Define  V 
as  V  with  Vj  deleted,  and  Z  by 


«  -  °'  ***  ■  \-t*V  1  -  ‘z  '1  • 


(4.8) 


Now  consider  the  gradient  of  the  linear  term  in  QP2,  i.e.,  VEe.  Since 
Vj  is  being  deleted  from  the  active  set  it  should  be  included  in  the 
gradient  of  an  objective  function  to  be  minimized  in  the  null  space  of 
V.  Therefore  define  the  steepest  descent  step  in  the  new  null  space 
to  be 

Zs,  -  -ZZT(Vle  +  v  )  . 

7  J 
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i 


ir 


(Here  v  has  a  positive  sign  since  (tt  )  >  1.)  We  then  have  the 
J  J 

following  result  relating  the  estimate  it^  to  the  first-order  feasi¬ 
bility  of  Zs.. 

Z 

THEOREM  2.  Assume  V  has  full  rank.  If  (^c)j  >  1»  then 

«T~ 

v  Zs.  >  0  . 

3  z 


Proof .  We  have  the  least  squares  characterization: 


VlTC  “  -(I  ~  ZZT) V£e  . 

Thus 

(zTVj)(wc)j  «  -zTVZe  . 


By  definition  of  Zs,  we  have: 

Z 


A  rn  ••  A  rp 

v.Zs.  ■  v*[Z  z]  s. 

3  Z  3  Z 

(VEe  +  Vj) 

-  -(v^z)  (zTVZe  +  zTVj) 

-  -(v*z)2  (1  -  (w^j)  >  0  . 

(The  fact  that  V  has  full  rank  implies  that  v*Z  +  0  and  hence 

Vj*  *  0.)  □ 

It  follows  from  Theorem  2  that  setting  p  -  Zs.  defines  a 

Z 

vector  p  which  is  first-order  feasible  w.r.t.  the  deleted  constraint 
£ 


"  -  [o  vjz] 
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u.  >  -f.(x),  since  p  is  first-order  feasible  w.r.t.  the  retained 
J  J  b 

active  constraint  u  f.(x)-  Note  that  (tt  ) .  being  >1  is 

J  J  t-  j 

equivalent  to  (X  )  . ,  the  multiplier  corresponding  to  the  deleted 
L  J 

constraint,  being  negative.  Clearly  if  (it  )  .  <  -1  and  hence 

v  J 

(X  )  <  0  then  the  deleted  constraint  would  be  u.  2  f.(x),  the 

U  J  J  J 

"X _ 

steepest  descent  step  would  be  -ZZ  (Vie  -  vj)»  and  this  would  have 
a  negative  inner  product  with  v^ . 


X^:  The  least  squares  estimate  in  the  larger  space. 

Define  X^  to  be  the  least  squares  solution  to  (4.6).  It  is 
worth  emphasizing  that  although  the  signs  of  the  components  of  X. 

L 

determine  the  feasibility  of  steepest  descent  steps  for  ELP  in  the  null  space 
of  A  with  a  column  deleted  (see  [10]  and  [15]),  the  estimate  XL  is  not 
relevant  to  the  algorithm  we  describe  for  solving  the  problem.  This  is 

because  (unlike  in  the  minimax  case  of  [15]),  it  is  the  range 
and  null  spaces  of  V,  not  A,  which  determine  the  search  direction  at 
each  iteration.  Unlike  the  minimax  case,  XL  and  Xc  are  not  scalar 
multipliers  of  each  other.  It  will  often  be  the  case  that  a  component 
of  Ac  is  negative  while  the  corresponding  component  of  X^  is 
positive,  indicating  that  if  the  corresponding  constraint  is  deleted, 
the  steepest  descent  step  (with  respect  to  ELP)  in  the  new  null  space 
in  the  (n+rn) -dimensional  space  will  not  be  feasible  with  respect  to 
the  deleted  constraint,  while  the  steepest  descent  step  (with  respect 
to  QP2)  in  the  new  null  space  in  the  n-dimensional  space  will  be 
first-order  feasible.  The  converse  is  much  less  likely  to  happen. 


i.e.,  where  a  component  of  is  positive  while  the  corresponding 

component  of  X^  is  negative,  but  it  is  possible  to  construct  such 
an  example. 

Quite  apart  from  its  deficiencies,  the  estimate  X.  is  more 

Li 

expensive  to  compute  than  X^,,  since  it  involves  the  factorization  of 
a  larger  matrix.  If  m  is  large  compared  to  n,  then  the  additional 
effort  may  be  prohibitively  high.  There  is  a  slight  simplification 
however:  since  the  last  t  rows  of  A  have  full  rank  and  are 
orthogonal  to  the  others,  it  follows  that  the  last  t  equations  in 
(4.6)  must  hold  exactly  and  hence  a  least  squares  problem  of  slightly 
reduced  dimension  can  be  solved. 

Clearly  it  is  undesirable  to  compute  XT  and  we  will  not 
discuss  this  estimate  any  further. 

Pw:  A  Second-Order  Estimate 

A  second-order  multiplier  estimate  can  be  defined  as  the  solution 
to  the  consistent  set  of  overdetermined  equations 

%  =  8  +  Ve  • 

The  fact  that  the  system  is  consistent  implies  that  uw  can  be  obtained 
from  solving 

Vn  -  -VEe  -  Wp 

and  there  is  no  concern  about  solving  the  larger  system.  A  negative 
component  of  y^  does  not  guarantee  that  either  a  steepest  descent  or 
Newton  step  will  be  first-order  feasible  with  respect  to  the  deleted 


constraint. 
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Using  the  estimate 


to  define  W 


IT 

c 

Both  the  estimates  A^.  and  uw  will  be  used  to  decide  when 

to  delete  functions  from  the  active  set.  Since  a  function  corresponding 

to  | (it  )  |  >  1  will  not  necessarily  be  deleted  from  the  active  set, 
t  J 

we  note  here  that  we  use  tt_  to  define  W  as  follows: 

L 


where 


t  2„  m-t 

W  -  l  f  +  l  o7  f 

i=l  i=l 


if  (Vi  <  -1 
if  (irc).  >  1 
otherwise. 


5.  Properties  of  Solution  of  QP  Subproblem 

In  this  section  we  examine  the  properties  of  the  solution  to 

QP2.  Initially,  we  assume  that  all  functions  with  zero  value  are 

-  T 

included  in  the  active  set,  and  that  V  has  full  rank  and  Z  WZ  is 
positive  definite  so  that  the  solution  p  is  given  by  (3.3),  (3.4)  and 
(3.5)  is  unique.  The  corresponding  solution  to  QP1  is  p  ,  where 
p  and  p  are  given  by  (3.1)  and  (3.2).  We  would  like  p£  to 
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satisfy  (2.2),  (2.3)  and  (2.5).  Clearly  the  constraints  of  QP1  ensure 

that  (2.3)  and  (2.5)  hold.  Thus  the  only  question  is  whether  p  is 

E 

a  descent  direction  for  ELP,  i.e.  whether  (2.2)  holds.  If  all  the 
active  functions  have  the  value  zero,  then  the  following  applies: 

THEOREM  3.  Suppose  that  f  *  0,  V  has  full  rank  and  ZTWZ  is  positive 
definite.  Then  p  ,  the  solution  to  QP1,  is  a  descent  direction  for 

L 

ELP  provided  it  is  not  zero. 

Proof.  We  have 

gTpE  -  eTp  +  eTp  =  eTIVTp  by  (3.1)  and  (3.2)  . 

Since  f  «■  0  we  have  «  0  and  p  =  Zpz  as  defined  by  (3.5).  Thus 

gTpE  -  iTJvTZpz  *  -pz(ZTWZ)pz  by  (3.5)  . 

T  T 

Since  Z  WZ  is  positive  definite,  g  p  must  be  negative  if  p  j*  0, 

£  Z 

i.e.  p  +  0.  o 

T _  (k) 

If  p  *  0,  then  by  (3.5)  Z  VZe  ■  0  and  x  is  a  minimum  on 

the  manifold  defined  by  the  current  active  constraints,  and  hence  (4.6) 

(k) 

is  a  consistent  set  of  equations  with  A  -  A  .  Thus  either  x  is 

v  Li 

the  required  solution,  or  one  of  the  components  of  A  is  negative  or 
zero,  i.e.  I  (irc)  |  1  for  some  j. 
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If  p  =  0,  and  at  least  one  of  the  multipliers  is  negative, 

i.e.  I  (it  )  .  |  >  1,  then  it  is  necessary  to  delete  a  corresponding 
u  J 

function  from  the  active  set  to  obtain  a  descent  direction.  If  p  =  0, 

and  the  component  of  with  largest  modulus  is  +  1,  corresponding 

00 

to  a  zero  multiplier,  then  x  may  or  may  not  be  a  solution.  We 
refer  to  Gill  and  Murray  [10]  for  the  treatment  of  zero  multipliers. 


5.1.  Nonzero  Active  Functions 

In  practice  it  will  rarely  be  the  case  that  f  =  0,  so  we  now 
drop  this  assumption.  If  we  were  sufficiently  restrictive  in  the 
definition  of  the  active  set  (e.g.  no  active  functions)  we  could  force 
this  condition  to  be  true,  but  it  is  important  for  the  efficiency  of 
the  algorithm  not  to  be  too  restrictive  in  the  choice  of  active  set. 

It  could  then  happen  that  p  is  an  ascent  direction  for  ELP .  It  is 
now  necessary  to  introduce  some  further  notation.  We  denote  the 
components  of  p„  which  correspond  to  the  orthogonal  n-vectors  Yp„ 
and  Zpz  by  p£Y  and  p£Z  respectively.  More  specifically,  we 
define: 


•  YPy 

'  ZPZ 

T3 

3 

II 

zvtypy 

and  PEZ  " 

ivTzpz 

-  -Ef 

0 

We  have  P£  -  Pgy  +  PEZ- 
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Without  the  assumption  that  f  =  0,  both  p  v  and  p  could 
be  ascent  directions  for  ELP.  If  either  component  is  a  descent  direction 
it  is  possible  to  take  a  weignted  combination  as  the  search  direction. 

T _ 

Also,  if  Z  Vie  ^  0,  the  component  p£z  can  be  replaced  by  the  follow¬ 
ing,  which  must  be  a  descent  direction: 

"  Zc*z 

— T 
IV  Zqz 

0 

where  Zqz  is  the  solution  to  QP3  and  q is  given  by  (3.6), 

T - 

When  Z  Vie  =  0  and  p„v  is  an  ascent  direction  for  ELP,  it 

EY 

is  necessary  to  delete  a  function  from  the  active  set  to  obtain  a 

descent  direction.  It  is  foolish  to  simply  delete  the  active  function 

with  the  largest  absolute  value.  The  following  theorem  shows  that, 

in  this  situation,  a  constraint  can  always  be  found  with  a  negative 

multiplier  estimate.  The  interpretation  of  p  being  an  ascent  direction, 

EY 

•p _ 

regardless  of  the  magnitude  of  ||  Z  VIe||  ,  is  that  too  many  functions 

have  been  selected  to  be  active  and  are  being  forced  to  be  approximately 
(k) 

zero  at  x  +  p,  thus  forcing  the  inactive  functions  to  increase  in 

modulus  more  than  the  active  ones  are  decreasing.  Therefore  the  follow- 
ing  is  also  useful  when  Z  Vie  0. 

THEOREM  4.  Assume  V  has  full  rank  and  let  p„„  be  defined  by  (5.1). 

EY 

If  g^p  >  0,  then  for  some  j  either  f  >  0  and  (it  ).  >  1,  or 
EY  j  w  j 

f j  <  0  and  (*c)j  <  -1- 
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T 

Proof.  It  follows  from  g  p„„  >  0  that 

LY 

-X — T  ~ 

e  £V  Yp  -  e  If  >  0  . 

A  characterization  of  the  solution  of  the  least  squares  problem  (4.7)  is 

Vttc  =  -YYTVEe  . 

Multiplying  both  sides  by  YpY,  we  have  from  (3.4)  that 

*J.f  -  eZVAYpY 

and  hence 

vt  >  e  If  . 

V 

Therefore  for  some  j,  (tt  )  f  >  |f  |  and  the  result  follows.  □ 

^  J  J  J 

T 

It  also  follows  from  the  above  that  If  g  p  *  0,  then  either 

Ex 

f  «  0  (covered  by  Theorem  3)  or  the  maximum  element  of  in  modulus 

is  one  (the  minimum  multiplier  estimate  is  zero) .  It  is  worth  noting 

that  Theorem  4  does  not  hold  in  general  if  other  multiplier  estimates 

such  as  XT  and  p  are  substituted  for  X_. 

L  W  t 

It  follows  from  Theorem  4  in  conjunction  with  Theorem  2  that 

if  p  v  is  an  ascent  direction  we  can  delete  the  active  constraint 

corresponding  to  a  negative  component  of  to  obtain  a  first-order 

feasible  direction.  As  in  [15],  we  will  not  actually  take  the  steepest 

descent  direction  Zs.  in  the  new  null  space.  Instead,  we  will  first 

Z 

try  computing  the  Newton  step  Zq„,  defined  by  (4.8)  and 

Z 


(ZTWZ)q_  -  -ZT(VEi  +  sgn(w  )  v  ) 

Z  ^  J  J 

where  the  j—  active  function  was  deleted.  If  Zq_  is  first-order 

Z 

feasible  with  respect  to  the  deleted  constraint,  we  take  as  our  search 

direction  yYp-  +  Zq_  for  some  y,  0  <_  y  <_  1,  where  Y  spans  the  range 

Z  th 

of  V  and  p^  is  defined  by  removing  the  j — -  equation  from  (3.4). 

Otherwise  we  replace  Zq.  by  Zr_,  where 

Z  Z 

qZ 

-zT(VIe  +  sgn(it^)^Vj) 

q  is  given  by  (3.6)  and  z  by  (4.8).  The  proof  that  Zr„  is  a 
L  Z 

descent  direction  and  is  first-order  feasible  combines  the  methods 

of  proof  of  Theorem  5  of  [15]  and  Theorem  2  above  in  a  straightforward 

way,  so  we  do  not  present  it  here. 

We  do  not  wish  to  minimize  on  manifolds  but  to  delete  constraints 

early  when  the  multiplier  estimates  become  sufficiently  reliable.  The 

above  discussion  of  the  step  to  take  after  deleting  a  constraint  applies 

T 

to  this  situation  as  well  as  to  the  situation  when  g  p^  >  0.  Both 
X  and  y  are  computed  when  possible  so  that  they  can  be  compared. 

v  W 
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5.2.  Avoiding  a  Rank-Deficient  Jacobian  or  an  Indefinite  Projected  Hessian 


The  methods  used  to  avoid  a  rank-deficient  matrix  V  or  an  indefi- 
T 

nite  matrix  Z  WZ  are  identical  to  those  used  in  [15].  In  the  former  case, 
instead  of  ordering  the  potential  columns  of  A  by  the  size  of  (c^) 
we  order  the  potential  columns  of  V  by  the  size  of  { | f  | }  in  the  QR 
factorization  of  the  Jacobian. 


6 .  Quasi-Newton  and  Finite  Difference  Approximations  to  the  Hessian 

In  practice  we  may  wish  to  use  a  quasi-Newton  or  finite 

difference  approximation  to  W  rather  than  the  analytical  Hessians 
2 

{V  f^}.  The  main  difference  between  the  and  minimax  cases  in  this 

regard  is  that  for  the  case,  W  involves  Hessians  of  all  the  m 

functions  {f^},  while  in  the  minimax  case  W  involves  the  Hessians 
of  only  the  t  active  functions.  Thus  a  finite  difference  approxi¬ 
mation  to  W  requires  extra  gradient  evaluations  of  all  m  functions 
at  each  iteration  instead  of  just  those  t  which  are  active.  Since 
for  many  applications  (particularly  arising  from  data  approximation) 
m  is  much  larger  than  n  and  hence  t,  this  means  that  finite 
difference  approximations  may  be  considerably  more  expensive  than  a 
quasi-Newton  approach  for  an  problem,  while  a  finite  difference 

approximation  may  be  more  efficient  for  a  comparable  minimax  problem. 
Note,  however,  that  the  extra  evaluations  of  each  gradient  need  only 
be  done  n-t  times,  along  each  column  of  Z. 
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7.  Determining  the  Steplength 

We  use  a  special  steplength  algorithm  tailored  to  the  problem 

to  obtain  the  steplength  o  at  each  iteration.  This  algorithm  is  pre¬ 
sented  in  [14].  The  initial  step  oQ  is  set  to  either  one,  or  the 
shortest  estimated  step  to  a  zero  of  an  inactive  function,  if  this  is 
less  than  one.  Thus 


where 


°q  -  min{ 1,Qq} , 


1  “f , 

f 

min 

1  i 

1  -T 

v^p  +  0  and  <  0 

1  vlP 

V 

8.  Selecting  the  Active  Set 

Currently  we  use  the  same  active  set  strategy  as  that  described 
for  problem  Jl^P  in  [15],  with  some  slight  modifications,  as  follows. 

We  define  the  scaled  function  values  by  c^  -  (mlf^h/F^.  Since  there 

may  be  no  active  functions  the  first  decision  is  whether  to  include 

the  smallest  one  (in  absolute  value)  in  the  active  set.  This  decision 

is  made  in  the  same  way  as  the  decision  of  whether  to  include  a  second 
active  constraint  in  the  minimax  case,  replacing  the  gradient  v^  by 
the  gradient  of  the  function  when  no  functions  are  active,  i.e. 

Vie. 
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9.  Relationships  to  Other  Algorithms 


We  begin  this  section  by  discussing  algorithms  for  problem 
ijP  when  the  functions  {f^  are  linear.  The  equivalent  problem  ELP 
is  then  a  linear  programming  problem,  although  it  is  not  in  standard 
form.  The  connection  between  the  linear  problem  and  linear  pro¬ 

gramming  (although  using  a  different  formulation  from  ELP)  was  observed 
by  Charnes,  Cooper  and  Ferguson  [6].  Since  then  a  large  number  of 
different  methods  have  been  proposed  for  solving  the  linear  l  problem 
by  various  linear  programming  formulations.  References  to  many  such 
methods  may  be  found  in  [2,3,4,19].  Probably  the  most  widely  used 
method  is  that  of  Barrodale  and  Roberts  [3],  which  solves  a  variation 
of  ELP  put  in  standard  form  by  a  primal  simplex  method  taking  account 
of  the  special  structure.  An  alternative  approach  taken  by  Claerbout 
and  Muir  [7]  and  Bartels,  Conn  and  Sinclair  [5]  is  to  solve  the  problem 
directly  by  minimizing  the  piecewise  linear  function.  However,  it  is 
possible  to  think  of  these  methods  as  linear  programming  methods  applied 
to  ELP,  by  considering  the  connection  between  the  vector  w  in  Bartels, 
Conn  and  Sinclair  (which  corresponds  to  ir  in  our  nonlinear  algorithm) 
and  the  simplex  (Lagrange)  multipliers  of  ELP,  just  as  we  do  for  the 
nonlinear  problem.  This  is  the  approach  we  prefer  since  it  retains 
the  familiar  linear  programming  terminology  while  avoiding  transforming 
ELP  to  standard  form. 

A  completely  different  approach  to  the  linear  problem  is 

to  solve  a  sequence  of  weighted  least  squares  problems  as  the  Lawson 
algorithm  (see  [17])  does  for  the  linear  Jl^  problem.  This  was  suggested 
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by  Schlossmacher  [18]  but  Gallant  and  Gerig  [9]  show  that  this  method 
can  be  unstable. 

It  is  somewhat  surprising,  given  the  number  of  linear 
algorithms,  that  there  has  been  comparatively  little  work  done  on 
the  nonlinear  problem.  Osborne  and  Watson  [16]  solve  the  nonlinear 

problem  by  solving  a  sequence  of  linear  problems.  The  solution 

to  each  linear  problem,  obtained  by  a  linear  programming  technique, 

is  used  as  a  search  direction  along  which  the  minimum  of  is  found 

by  an  exact  line  search.  We  are  aware  of  only  two  published  methods 
for  the  nonlinear  problem  which  use  second-order  information,  both 

of  which  appeared  only  recently.  El-Attar,  Vidyasagar  and  Dutta  [8] 
suggest  a  method  related  to  the  penalty  method  for  nonlinear  programming 
in  which  a  sequence  of  increasingly  ill-conditioned  unconstrained 
optimization  problems  are  solved.  McLean  and  Watson  [13]  propose  both  a 
first-order  Levenberg-type  of  method  similar  to  those  of  [1,12]  for 
the  minimax  problem,  and  a  method  which  uses  second-order  information. 
The  latter  is  a  two-stage  method  similar  to  that  of  Watson  [20]  for 
the  minimax  problem,  in  which  successive  linear  programming  problems 
are  solved  until  it  is  thought  that  the  active  set  has  been  Identified, 
whereupon  a  switch  is  made  to  solving  a  system  of  nonlinear  equations 
by  Newton's  method.  The  system  has  order  n+t,  since  the  variables  and 
multipliers  are  obtained  together.  Since  t  may  often  be  close  to  n 
(t  equals  n  in  the  linear  case),  the  systems  of  equations  which  are 
solved  may  be  much  larger  than  the  ones  we  solve. 


We  are  not  aware  of  any  published  methods  of  the  projected 
Lagrangian  type  for  problem  i^P,  although  we  understand  that  Bartels 
and  Conn  are  currently  doing  some  related  work.  It  would  be  possible 
to  construct  a  method  related  to  ours  but  which  solves  an  inequality- 
constrained  quadratic  program  variant  of  QP1  at  each  iteration  as  Han 
[11]  does  for  the  minimax  problem.  However,  such  a  QP  has  n+m  variables 
and  it  is  not  possible  to  transform  this  directly  to  an  inequality- 
constrained  QP  in  n  variables  (as  we  transform  QP1  to  QP2) .  It  would 
be  necessary  to  solve  the  inequality-constrained  QP  by  a  special-purpose 
method  taking  into  account  the  special  structure,  just  as  the  Bartels, 
Conn  and  Sinclair  method  essentially  solves  the  linear  program  equivalent 
of  ELP  by  a  special-purpose  method.  See  [15]  for  remarks  concerning  the 
relative  merits  of  solving  the  equality  and  inequality-constrained  QP's. 

The  remarks  on  asymptotic  local  quadratic  convergence  made  in 
[15]  for  the  minimax  problem  carry  over  without  difficulty  to  our 
algorithm  for  i^P. 


10.  Computational  Results 

We  present  the  results  of  applying  the  algorithm  to 
problems  with  the  same  definitions  of  (f^(x))  as  the  first  four 
problems  presented  in  [15].  The  solutions  obtained  are  listed  below. 
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Problem  1. 


Fx(x)  -  0.12434  with  x  -  (0.10094,  1.52515,  1.97211)1  . 

Problem  2. 

F^x)  -  0.0038768  with  x  *  (0.19337,  0.19377,  0.10893,  0.13973)1 
Problem  3. 

F^x)  -  1.00000  with  x  -  (0.0000,  0.0002)T. 

Problem  4. 

F1(x)  -  7.8942  with  x  *  (0.53597,  0.00000,  0.031918)1. 

The  results  are  summarized  in  Table  1.  The  termination  conditions 
were  that  || f  || ^  <  10  **>  HzTVEe||  <  10  ZTWZ  numerically  positive 
definite  and  ^  >  0.  The  line  search  accuracy  parameter  n  was  set 
to  0.9  (see  [14]  for  the  definition  of  this  parameter).  Several  other 
choices  of  n  were  tried,  but  n  *  0.9  was  the  most  efficient,  indi¬ 
cating  as  expected  that  a  slack  line  search  is  desirable  at  least 
on  these  problems.  The  machine  used  was  an  IBM  370/168  in  double 
precision,  i.e.  with  16  decimal  digits  of  accuracy.  The  column  headed 
NI  reports  the  number  of  iterations  required,  which  is  also  the  number 
of  times  the  Hessian  was  approximated  using  finite  differences.  The 
column  headed  NF  gives  the  number  of  function  evaluations  (not  including 
gradient  evaluations  for  the  Hessian  approximation) . 
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TABLE  1 


Problem 

n 

m 

* 

n-t 

NI 

NF 

1 

(Bard) 

3 

15 

0 

20 

20 

2 

(Kowalik  and  Osborne) 

4 

11 

0 

11 

14 

3 

(Madsen) 

2 

3 

0 

15 

15 

4 

_ 

(El-Attar  et  al.  #2) 

3 

6 

2 

10 

11 

These  results  demonstrate  that  our  algorithm  can  be  very  efficient 
Final  quadratic  convergence  was  observed  in  all  cases.  The  results  must 
however  be  regarded  as  preliminary  since  further  work  needs  to  be  done 
regarding  the  active  set  strategy. 


11.  Concluding  Remarks 

The  nonlinear  optimization  problem  has  been  shown  to  be  as 

tractable  as  the  nonlinear  minimax  problem  using  a  projected  Lagrangian 
algorithm  closely  related  to  that  of  [15].  Although  the  nonlinearly 
constrained  optimization  problem  which  is  equivalent  to  involves 

m  extra  variables,  we  have  shown  how  to  derive  a  method  which  solves 
successive  quadratic  programming  problems  in  only  n  variables.  The 
different  roles  of  multiplier  estimates  and  directions  of  search  in 
the  (n+m)-  and  n-dlmensional  spaces  have  been  emphasized. 


We  could  repeat  many  of  the  concluding  remarks  of  [15]  here. 

In  particular  we  observe  that  linear  constraints  can  be  incorporated 
into  the  algorithm  but  that  nonlinear  constraints  increase  the 
complexity  of  problem  f^P  to  that  of  the  general  nonlinear  constrained 
optimization  problem.  In  summary,  the  method  of  this  paper  has  been 
designed  to  take  advantage  of  all  the  special  properties  of  the  Jl^ 
problem  which  are  not  available  for  general  constrained  optimization 
problems . 
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